Results 1 - 20 of 130
1.
Nat Aging ; 4(4): 584-594, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38528230

ABSTRACT

Multiomics has shown promise in noninvasive risk profiling and early detection of various common diseases. In the present study, in a prospective population-based cohort with ~18 years of e-health record follow-up, we investigated the incremental and combined value of genomic and gut metagenomic risk assessment compared with conventional risk factors for predicting incident coronary artery disease (CAD), type 2 diabetes (T2D), Alzheimer disease and prostate cancer. We found that polygenic risk scores (PRSs) improved prediction over conventional risk factors for all diseases. Gut microbiome scores improved predictive capacity over baseline age for CAD, T2D and prostate cancer. Integrated risk models of PRSs, gut microbiome scores and conventional risk factors achieved the highest predictive performance for all diseases studied compared with models based on conventional risk factors alone. The present study demonstrates that integrated PRSs and gut metagenomic risk models improve the predictive value over conventional risk factors for common chronic diseases.


Subject(s)
Coronary Artery Disease , Diabetes Mellitus, Type 2 , Prostatic Neoplasms , Male , Humans , Diabetes Mellitus, Type 2/diagnosis , Prospective Studies , Risk Factors , Coronary Artery Disease/genetics , Genetic Risk Score
2.
Stud Health Technol Inform ; 310: 1454-1455, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269693

ABSTRACT

Surveillance of invasive fungal infection (IFI) requires laborious review of multiple sources of clinical information, while applying complex criteria to effectively identify relevant infections. These processes can be automated using artificial intelligence (AI) methodologies, including applying natural language processing (NLP) to clinical reports. However, developing a practically useful automated IFI surveillance tool requires consideration of the implementation context. We employed the Design Thinking Framework (DTF) to focus on the needs of end users of the tool to ensure sustained user engagement and enable its prospective validation. DTF allowed iterative generation of ideas and refinement of the final digital health solution. We believe this approach is key to increasing the likelihood that the solution will be implemented in clinical practice.


Subject(s)
Labor, Obstetric , Mycoses , Pregnancy , Female , Humans , Artificial Intelligence , Digital Health , Mycoses/diagnosis , Natural Language Processing
3.
Stud Health Technol Inform ; 310: 1460-1461, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269696

ABSTRACT

Clinical text contains rich patient information and has attracted much research interest in applying Natural Language Processing (NLP) tools to model it. In this study, we quantified and analyzed the textual characteristics of five common clinical note types using multiple measurements, including lexical-level features, semantic content, and grammaticality. We found significant linguistic variation across clinical note types, although some types are more similar to one another than others.


Subject(s)
Linguistics , Natural Language Processing , Humans , Semantics
4.
Stud Health Technol Inform ; 310: 169-173, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269787

ABSTRACT

It is imperative to build clinician trust to reuse ever-growing amounts of rich clinical data. Utilising a proprietary, structured electronic health record (EHR), we address data quality by assessing the plausibility of data entry by chiropractors, physical therapists and osteopaths, to help determine whether the data are fit for use in predicting outcomes of work-related musculoskeletal disorders using machine learning. For most variables assessed, individual clinician data entry positively correlated with the clinician group's data entry, indicating that the data are fit for reuse. However, from the clinician's perspective, there were inconsistencies, which could lead to data mistrust. When assessing data quality in EHR studies, it is crucial to engage clinicians, with their deep understanding of EHR use, as they can suggest improvements. Clinicians should be considered local knowledge experts.


Subject(s)
Data Accuracy , Physical Therapists , Humans , Electronic Health Records , Knowledge , Machine Learning
5.
Prev Vet Med ; 223: 106112, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38176151

ABSTRACT

BACKGROUND: Temporal phenotyping of patient journeys, which capture the common sequence patterns of interventions in the treatment of a specific condition, is useful to support understanding of antimicrobial usage in veterinary patients. Identifying and describing these phenotypes can inform antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals, in which veterinarians have an important role to play. OBJECTIVE: This research proposes a framework for extracting temporal phenotypes of patient journeys from clinical practice data through the application of natural language processing (NLP) and unsupervised machine learning (ML) techniques, using cat bite abscesses as a model condition. By constructing temporal phenotypes from key events, the relationship between antimicrobial administration and surgical interventions can be described, and similar treatment patterns can be grouped together to describe outcomes associated with specific antimicrobial selection. METHODS: Cases identified as having a cat bite abscess as a diagnosis were extracted from VetCompass Australia, a database of veterinary clinical records. A classifier was trained and used to label the most clinically relevant event features in each record as chosen by a group of veterinarians. The labeled records were processed into coded character strings, where each letter represents a summary of specific types of treatments performed at a given visit. The sequences of letters representing the cases were clustered based on weighted Levenshtein edit distances with KMeans++ to identify the main variations of the patient treatment journeys, including the antimicrobials used and their duration of administration. RESULTS: A total of 13,744 records that met the selection criteria were extracted and grouped into 8436 cases. 
There were 9 clinically distinct event sequence patterns (temporal phenotypes) of patient journeys identified, representing the main sequences in which surgery and antimicrobial interventions are performed. Patients receiving amoxicillin and surgery had the shortest duration of antimicrobial administration (median of 3.4 days) and patients receiving cefovecin with no surgical intervention had the longest antimicrobial treatment duration (median of 27 days). CONCLUSION: Our study demonstrates methods to extract and provide an overview of temporal phenotypes of patient journeys, which can be applied to text-based clinical records for multiple species or clinical conditions. We demonstrate the effectiveness of this approach to derive real-world evidence of treatment impacts using cat bite abscesses as a model condition to describe patterns of antimicrobial therapy prescriptions and their outcomes.
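
The distance underlying the clustering step in this abstract can be illustrated with a minimal sketch of a weighted Levenshtein edit distance between coded visit strings. The event letters, the substitution weights and the function name are hypothetical; the abstract does not give the study's actual coding scheme or weights, and the KMeans++ clustering itself is not shown.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance between strings a and b.

    sub_cost maps a pair of characters to a substitution cost; pairs that
    are not listed cost 1.0. Insertions and deletions cost indel_cost.
    All weights here are illustrative assumptions.
    """
    sub_cost = sub_cost or {}
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                s = 0.0
            else:
                s = sub_cost.get((ca, cb), sub_cost.get((cb, ca), 1.0))
            curr.append(min(prev[j] + indel_cost,      # delete ca
                            curr[j - 1] + indel_cost,  # insert cb
                            prev[j - 1] + s))          # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical codes: "A" = antimicrobial only, "S" = surgery only,
# "B" = both at the same visit. Swapping "A" for "B" is made cheap.
weights = {("A", "B"): 0.5}
print(weighted_levenshtein("AAS", "ABS", weights))
```

Lower substitution weights make journeys that differ only in clinically similar events land closer together, which is the usual motivation for weighting the distance before clustering.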


Subject(s)
Anti-Infective Agents , Bites and Stings , Humans , Animals , Abscess/veterinary , Natural Language Processing , Amoxicillin , Bites and Stings/veterinary , Cluster Analysis
6.
J Biomed Inform ; 145: 104464, 2023 09.
Article in English | MEDLINE | ID: mdl-37541406

ABSTRACT

OBJECTIVE: We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS: We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS: We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION: Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY: Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.
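
The "graph densities of less than 1%" remark can be checked against the rounded node and edge counts reported in this abstract. The sketch below assumes a simple graph (no self-loops or parallel edges) and uses the rounded figures of about 11 k nodes and 394 k edges; the per-time-slice densities in the study itself may differ.

```python
def graph_density(n_nodes, n_edges, directed=False):
    """Fraction of possible edges that are present in a simple graph."""
    if n_nodes < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2  # each unordered pair counts once
    return n_edges / possible


# Rounded counts taken from the abstract (approximate values).
density = graph_density(11_000, 394_000)
print(f"{density:.4%}")
```

Under these assumptions the undirected density comes out at roughly 0.65%, and the directed reading is half of that, so both are consistent with the stated bound.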


Subject(s)
Alzheimer Disease , Knowledge Discovery , Humans , Knowledge Discovery/methods , Alzheimer Disease/diagnosis , Neural Networks, Computer , Learning , Phenotype
7.
J Biomed Inform ; 145: 104466, 2023 09.
Article in English | MEDLINE | ID: mdl-37549722

ABSTRACT

OBJECTIVE: With the increasing amount and growing variety of healthcare data, multimodal machine learning supporting integrated modeling of structured and unstructured data is an increasingly important tool for clinical machine learning tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients can have substantial variations in the availability of data, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities. METHODS: We propose a Transformer-based fusion model with modality-specific tokens that summarize the corresponding modalities to achieve effective cross-modal interaction accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model as Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities. RESULTS: Our model shows improved performances over unimodal or multimodal baselines in both evaluation regimes, including or excluding patients with missing modalities in the input. The contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results. CONCLUSION: We propose a multimodal model for robust clinical prediction to achieve improved performance while accommodating patients with missing modalities. 
This work could inspire future research to study the effective incorporation of multiple, more complex modalities of clinical data into a single model.


Subject(s)
Benchmarking , Machine Learning , Humans
8.
Med J Aust ; 219(3): 98-100, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37302124
9.
J Clin Epidemiol ; 159: 58-69, 2023 07.
Article in English | MEDLINE | ID: mdl-37120028

ABSTRACT

OBJECTIVES: A major obstacle in deployment of models for automated quality assessment is their reliability. We analyze their calibration and selective classification performance. STUDY DESIGN AND SETTING: We examine two systems for assessing the quality of medical evidence, EvidenceGRADEr and RobotReviewer, both developed from Cochrane Database of Systematic Reviews (CDSR) to measure strength of bodies of evidence and risk of bias (RoB) of individual studies, respectively. We report their calibration error and Brier scores, present their reliability diagrams, and analyze the risk-coverage trade-off in selective classification. RESULTS: The models are reasonably well calibrated on most quality criteria (expected calibration error [ECE] 0.04-0.09 for EvidenceGRADEr, 0.03-0.10 for RobotReviewer). However, we discover that both calibration and predictive performance vary significantly by medical area. This has ramifications for the application of such models in practice, as average performance is a poor indicator of group-level performance (e.g., health and safety at work, allergy and intolerance, and public health see much worse performance than cancer, pain and anesthesia, and neurology). We explore the reasons behind this disparity. CONCLUSION: Practitioners adopting automated quality assessment should expect large fluctuations in system reliability and predictive performance depending on the medical area. Prospective indicators of such behavior should be further researched.
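
The two calibration measures named in this abstract can be sketched for a binary classifier as follows. This is an illustrative version only: it bins on the predicted probability of the positive class with ten equal-width bins, which is an assumption and not necessarily the binning or the multi-class formulation used in the study.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and the
    observed positive rate, over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_p - pos_rate)
    return ece


# Toy predictions, not data from the study.
probs = [0.9, 0.9, 0.2, 0.1]
labels = [1, 0, 0, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Computing these per medical area rather than over the pooled test set is what would expose the group-level differences the abstract describes.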


Subject(s)
Reproducibility of Results , Humans , Prospective Studies , Systematic Reviews as Topic , Bias
10.
Int J Med Inform ; 173: 105021, 2023 05.
Article in English | MEDLINE | ID: mdl-36870249

ABSTRACT

INTRODUCTION: Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed; however, given wide variations in clinical documentation practices, these cannot be utilized without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes. METHODS: Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning (MIST). 300 patient progress notes from three general practice clinics were manually annotated with personally identifying information. We conducted a pairwise comparison between the manual annotations and patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), f1-score (harmonic mean of precision and recall), and f2-score (weighs recall 2x higher than precision). Error analysis was also conducted to better understand each tool's structure and performance. RESULTS: Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%) and all tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE while also achieving similar recall to the rule-based tools for DATE and highest recall for LOCATION. Philter had the lowest aggregate precision (37%); however, preliminary adjustments of its rules and dictionaries showed a substantial reduction in false positives. 
CONCLUSION: Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate due to its high recall and flexibility; however, it will require extensive revision of its pattern-matching rules and dictionaries.
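
The four measures listed in the METHODS of this abstract follow from true-positive, false-positive and false-negative counts. The sketch below uses the standard F-beta formula; the counts in the example are hypothetical and are not taken from the study.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion counts.

    beta = 1 gives the f1-score; beta = 2 gives the f2-score, which
    favours recall over precision.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical tool output: 8 identifiers found correctly, 2 spurious, 8 missed.
p, r, f1 = precision_recall_fbeta(tp=8, fp=2, fn=8)
_, _, f2 = precision_recall_fbeta(tp=8, fp=2, fn=8, beta=2.0)
print(p, r, round(f1, 3), round(f2, 3))
```

For de-identification a missed identifier is usually costlier than a false alarm, which is a plausible reason to report an f2-score alongside the f1-score.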


Subject(s)
Electronic Health Records , General Practice , Humans , Confidentiality , Data Anonymization , Australia , Natural Language Processing
11.
J Biomed Semantics ; 14(1): 1, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36721225

ABSTRACT

BACKGROUND: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS: We developed a pathogen mention characterisation literature data set (READBiomed-Pathogens) automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. 
We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. An additional evaluation of the pathogen mention identification and characterisation algorithms on a small manually annotated data set shows that the data set we generated allows characterising pathogens of interest. TRIAL REGISTRATION: N/A.


Subject(s)
Algorithms , Natural Language Processing , Databases, Genetic , MEDLINE , Machine Learning
12.
J Med Internet Res ; 25: e35568, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36722350

ABSTRACT

BACKGROUND: Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria. OBJECTIVE: We proposed a quality assessment task that provides an overall quality rating for each body of evidence (BoE), as well as finer-grained justification for different quality criteria according to the Grading of Recommendation, Assessment, Development, and Evaluation formalization framework. For this purpose, we constructed a new data set and developed a machine learning baseline system (EvidenceGRADEr). METHODS: We algorithmically extracted quality-related data from all summaries of findings found in the Cochrane Database of Systematic Reviews. Each BoE was defined by a set of population, intervention, comparison, and outcome criteria and assigned a quality grade (high, moderate, low, or very low) together with quality criteria (justification) that influenced that decision. Different statistical data, metadata about the review, and parts of the review text were extracted as support for grading each BoE. After pruning the resulting data set with various quality checks, we used it to train several neural-model variants. The predictions were compared against the labels originally assigned by the authors of the systematic reviews. RESULTS: Our quality assessment data set, Cochrane Database of Systematic Reviews Quality of Evidence, contains 13,440 instances, or BoEs labeled for quality, originating from 2252 systematic reviews published on the internet from 2002 to 2020. 
On the basis of a 10-fold cross-validation, the best neural binary classifiers for quality criteria detected risk of bias at 0.78 F1 (P=.68; R=0.92) and imprecision at 0.75 F1 (P=.66; R=0.86), while the performance on inconsistency, indirectness, and publication bias criteria was lower (F1 in the range of 0.3-0.4). The prediction of the overall quality grade into 1 of the 4 levels resulted in 0.5 F1. When casting the task as a binary problem by merging the Grading of Recommendation, Assessment, Development, and Evaluation classes (high+moderate vs low+very low-quality evidence), we attained 0.74 F1. We also found that the results varied depending on the supporting information that is provided as an input to the models. CONCLUSIONS: Different factors affect the quality of evidence in the context of systematic reviews of medical evidence. Some of these (risk of bias and imprecision) can be automated with reasonable accuracy. Other quality dimensions such as indirectness, inconsistency, and publication bias prove more challenging for machine learning, largely because they are much rarer. This technology could substantially reduce reviewer workload in the future and expedite quality assessment as part of evidence synthesis.


Subject(s)
Machine Learning , Humans , Systematic Reviews as Topic , Bias
13.
J Allergy Clin Immunol ; 151(4): 943-952, 2023 04.
Article in English | MEDLINE | ID: mdl-36587850

ABSTRACT

BACKGROUND: The gut-lung axis is generally recognized, but there are few large studies of the gut microbiome and incident respiratory disease in adults. OBJECTIVE: We sought to investigate the association and predictive capacity of the gut microbiome for incident asthma and chronic obstructive pulmonary disease (COPD). METHODS: Shallow metagenomic sequencing was performed for stool samples from a prospective, population-based cohort (FINRISK02; N = 7115 adults) with linked national administrative health register-derived classifications for incident asthma and COPD up to 15 years after baseline. Generalized linear models and Cox regressions were used to assess associations of microbial taxa and diversity with disease occurrence. Predictive models were constructed using machine learning with extreme gradient boosting. Models considered taxa abundances individually and in combination with other risk factors, including sex, age, body mass index, and smoking status. RESULTS: A total of 695 and 392 statistically significant associations were found between baseline taxonomic groups and incident asthma and COPD, respectively. Gradient boosting decision trees of baseline gut microbiome abundance predicted incident asthma and COPD in the validation data sets with mean areas under the curve of 0.608 and 0.780, respectively. Cox analysis showed that the baseline gut microbiome achieved higher predictive performance than individual conventional risk factors, with C-indices of 0.623 for asthma and 0.817 for COPD. The integration of the gut microbiome and conventional risk factors further improved prediction capacities. CONCLUSIONS: The gut microbiome is a significant risk factor for incident asthma and incident COPD and is largely independent of conventional risk factors.


Subject(s)
Asthma , Gastrointestinal Microbiome , Pulmonary Disease, Chronic Obstructive , Adult , Humans , Prospective Studies , Risk Factors
14.
J Biomed Inform ; 139: 104293, 2023 03.
Article in English | MEDLINE | ID: mdl-36682389

ABSTRACT

Invasive fungal infections (IFIs) are particularly dangerous to high-risk patients with haematological malignancies and are responsible for excessive mortality and delays in cancer therapy. Surveillance of IFI in clinical settings offers an opportunity to identify potential risk factors and evaluate new therapeutic strategies. However, manual surveillance is both time- and resource-intensive. As part of a broader project aimed to develop a system for automated IFI surveillance by leveraging electronic medical records, we present our approach to detecting evidence of IFI in the key diagnostic domain of histopathology. Using natural language processing (NLP), we analysed cytology and histopathology reports to identify IFI-positive reports. We compared a conventional bag-of-words classification model to a method that relies on concept-level annotations. Although the investment to prepare data supporting concept annotations is substantial, extracting targeted information specific to IFI as a pre-processing step increased the performance of the classifier from the PR AUC of 0.84 to 0.92 and enabled model interpretability. We have made publicly available the annotated dataset of 283 reports, the Cytology and Histopathology IFI Reports corpus (CHIFIR), to allow the clinical NLP research community to further build on our results.


Subject(s)
Invasive Fungal Infections , Humans , Invasive Fungal Infections/epidemiology , Electronic Health Records , Natural Language Processing , Risk Factors
15.
J Med Internet Res ; 24(12): e38859, 2022 12 23.
Article in English | MEDLINE | ID: mdl-36563029

ABSTRACT

BACKGROUND: Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description. OBJECTIVE: We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes. METHODS: In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. 
We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text. RESULTS: First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F1-score=0.62-0.64 vs F1-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two. CONCLUSIONS: Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously.


Subject(s)
Language , Research Design , Humans
16.
Health Inf Manag ; : 18333583221131753, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36374542

ABSTRACT

BACKGROUND: The Australian hospital-acquired complication (HAC) policy was introduced to facilitate negative funding adjustments in Australian hospitals using ICD-10-AM codes. OBJECTIVE: The aim of this study was to determine the positive predictive value (PPV) of the ICD-10-AM codes in the HAC framework to detect hospital-acquired pneumonia in patients with cancer and to describe any change in PPV before and after implementation of an electronic medical record (EMR) at our centre. METHOD: A retrospective case review of all coded pneumonia episodes at the Peter MacCallum Cancer Centre in Melbourne, Australia spanning two time periods (01 July 2015 to 30 June 2017 [pre-EMR period] and 01 September 2020 to 28 February 2021 [EMR period]) was performed to determine the proportion of events satisfying standardised surveillance definitions. RESULTS: HAC-coded pneumonia occurred in 3.66% (n = 151) of 41,260 separations during the study period. Of the 151 coded pneumonia separations, 27 satisfied consensus surveillance criteria, corresponding to an overall PPV of 0.18 (95% CI: 0.12, 0.25). The PPV was approximately three times higher following EMR implementation (0.34 [95% CI: 0.19, 0.53] versus 0.13 [95% CI: 0.08, 0.21]; p = .013). CONCLUSION: The current HAC definition is a poor-to-moderate classifier for hospital-acquired pneumonia in patients with cancer and, therefore, may not accurately reflect hospital-level quality improvement. Implementation of an EMR did enhance case detection, and future refinements to administratively coded data in support of robust monitoring frameworks should focus on EMR systems. IMPLICATIONS: Although ICD-10-AM data are readily available in Australian healthcare settings, these data are not sufficient for monitoring and reporting of hospital-acquired pneumonia in haematology-oncology patients.
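
The overall positive predictive value in this abstract is the share of coded episodes that met the surveillance criteria (27 of 151). The sketch below recomputes it and attaches a Wilson score interval. The abstract does not say which interval method the authors used, so the Wilson interval is an assumption and its bounds need not match the reported ones exactly.

```python
import math


def ppv_with_wilson_ci(confirmed, flagged, z=1.96):
    """Positive predictive value and a Wilson score interval.

    confirmed: flagged episodes that met the reference definition.
    flagged:   all episodes flagged by the coding rule.
    z:         normal quantile (1.96 for an approximate 95% interval).
    """
    p = confirmed / flagged
    denom = 1 + z * z / flagged
    centre = (p + z * z / (2 * flagged)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / flagged + z * z / (4 * flagged ** 2)
    ) / denom
    return p, centre - half_width, centre + half_width


ppv, low, high = ppv_with_wilson_ci(27, 151)
print(f"PPV = {ppv:.2f}, 95% CI approx. ({low:.2f}, {high:.2f})")
```

An exact (Clopper-Pearson) interval would be slightly wider; either way the interval is broad because only 151 coded episodes were reviewed.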

17.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36266246

ABSTRACT

Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within and across databases, offers new opportunities to detect, or even correct, erroneous entries and more broadly to make inferences about record quality. Here, we describe this novel perspective of sequence database records as a rich network, which we call the sequence database network, and illustrate the opportunities this perspective offers for quantification of database quality and detection of spurious entries. We provide an overview of the relevant databases and describe how the interdependencies between sequence records across these databases can be exploited by network analyses. We review the process of sequence annotation and provide a classification of sources of error, highlighting propagation as a major source. We illustrate the value of a network perspective through three case studies that use network analysis to detect errors, and explore the quality and quantity of critical relationships that would inform such network analyses. This systematic description of a network perspective of sequence database records provides a novel direction to combat the proliferation of errors within these critical bioinformatics resources.


Subject(s)
Computational Biology , Databases, Nucleic Acid , Amino Acid Sequence
18.
Bioinformatics ; 38(22): 5026-5032, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36124954

ABSTRACT

MOTIVATION: Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with a lack of interpretability. More interpretable visible neural network architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. RESULTS: We propose a novel Mutated Pathway Visible Neural Network (MPVNN) architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar-sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that is important in risk prediction for particular cancer types, is reliable. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/gourabghoshroy/MPVNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Phosphatidylinositol 3-Kinases , Humans , Phosphatidylinositol 3-Kinases/genetics , Proto-Oncogene Proteins c-akt , Neural Networks, Computer , Neoplasms/genetics , Mutation
19.
J Am Med Inform Assoc ; 29(10): 1810-1817, 2022 09 12.
Article in English | MEDLINE | ID: mdl-35848784

ABSTRACT

Electronic medical records are increasingly used to store patient information in hospitals and other clinical settings. There has been a corresponding proliferation of clinical natural language processing (cNLP) systems aimed at using text data in these records to improve clinical decision-making, in comparison to manual clinician search and clinical judgment alone. However, these systems have delivered marginal practical utility and are rarely deployed into healthcare settings, leading to proposals for technical and structural improvements. In this paper, we argue that this reflects a violation of Friedman's "Fundamental Theorem of Biomedical Informatics," and that a deeper epistemological change must occur in the cNLP field, as a parallel step alongside any technical or structural improvements. We propose that researchers shift away from designing cNLP systems independent of clinical needs, in which cNLP tasks are ends in themselves ("tasks as decisions"), and toward systems that are directly guided by the needs of clinicians in realistic decision-making contexts ("tasks as needs"). A case study illustrates the potential benefits of developing cNLP systems that are designed to more directly support clinical needs.


Subject(s)
Electronic Health Records , Natural Language Processing , Clinical Decision-Making , Delivery of Health Care , Humans
20.
J Biomed Inform ; 133: 104149, 2022 09.
Article in English | MEDLINE | ID: mdl-35878821

ABSTRACT

One unintended consequence of Electronic Health Record (EHR) implementation is the overuse of content-importing technology, such as copy-and-paste, that creates "bloated" notes containing large amounts of textual redundancy. Despite the rising interest in applying machine learning models to learn from real-patient data, it is unclear how the phenomenon of note bloat might affect the Natural Language Processing (NLP) models derived from these notes. Therefore, in this work we examine the impact of redundancy on deep learning-based NLP models, considering four clinical prediction tasks using a publicly available EHR database. We applied two deduplication methods to the hospital notes, identifying large quantities of redundancy, and found that removing the redundancy usually has little negative impact on downstream performance, and can in certain circumstances help models achieve significantly better results. We also showed it is possible to attack model predictions by simply adding note duplicates, turning correct predictions made by trained models into wrong ones. In conclusion, we demonstrated that EHR text redundancy substantively affects NLP models for clinical prediction tasks, showing that awareness of clinical context and robust modeling methods are important to create effective and reliable NLP systems in healthcare contexts.


Subject(s)
Deep Learning , Natural Language Processing , Electronic Health Records , Humans , Machine Learning
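
To make the idea of note deduplication in entry 20 concrete, here is a minimal Python sketch. The abstract does not describe its two deduplication methods, so this is not the authors' approach: it is one simple, hypothetical strategy that drops any sentence already seen in an earlier note for the same patient. The function names `split_sentences` and `deduplicate_notes` and the naive sentence splitter are assumptions made for illustration.

```python
import re


def split_sentences(note: str) -> list[str]:
    """Naively split a note on sentence-ending punctuation or newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", note)
    return [part.strip() for part in parts if part.strip()]


def deduplicate_notes(notes: list[str]) -> list[str]:
    """Remove sentences that already appeared in an earlier note.

    Notes are assumed to be in chronological order for one patient.
    Matching is exact after lowercasing and whitespace normalisation,
    so near-duplicates with small edits are NOT detected.
    """
    seen: set[str] = set()
    cleaned = []
    for note in notes:
        kept = []
        for sentence in split_sentences(note):
            key = " ".join(sentence.lower().split())
            if key not in seen:
                seen.add(key)
                kept.append(sentence)
        cleaned.append(" ".join(kept))
    return cleaned


# Hypothetical example: the second note copies a sentence from the first.
example_notes = [
    "Patient stable. No fever.",
    "Patient stable. New cough today.",
]
deduplicated = deduplicate_notes(example_notes)
print(deduplicated)
```

Exact matching keeps the sketch short and predictable; catching copy-and-paste with small edits would need a fuzzy comparison, which is outside what the abstract specifies.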